Subject: Distributed Database Systems Q1) Map Reduce and Distributed Databases

Authors

  • Jozsef Patvarczki
  • D. Abadi
  • A. Silberschatz
  • A. Rasin
Abstract

MapReduce (Hadoop) is a popular framework for processing data in parallel. It requires taking the data "out of" the database in order to process it efficiently, yet it promises high-performance, scalable data processing by readily exploiting the power of a compute cluster (cloud). More recently, a marriage of MapReduce with DBMS technologies has been touted as a new approach to achieving performance on the one hand and scalability and flexibility on the other. One example of such a hybrid architecture is HadoopDB by Abouzeid et al. [1].
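
For readers unfamiliar with the programming model mentioned in the abstract, the following is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the job. It is not Hadoop's API and not the authors' system; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration. A real framework would run the map and reduce calls on different cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in order on an in-memory list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

Because each `map_phase` call sees only one record and each `reduce_phase` call sees only one key, both phases can in principle be spread across machines, which is the property the abstract refers to as exploiting a compute cluster.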

Similar resources

Scheduling In Distributed Systems

In this paper we take a look at three decades of the history and evolution of distributed data processing systems. We consider how allocation of resources to tasks, i.e. scheduling, is done in three systems and study how system goals influence scheduler design and possible scheduling optimizations. We start by examining the first general-purpose distributed computing system Condor [1], continue...

Using Map and Reduce for Querying Distributed XML Data

Semi-structured information is often represented in the XML format. Although a large number of databases exist for efficiently storing semi-structured data, the rapidly growing volume of data demands larger databases. Even when secondary storage can hold the large amount of data, the execution time of complex queries increases significantly, if no suitable...

Distributed Algorithm for Frequent Pattern Mining using HadoopMap Reduce Framework

With the rapid growth of information technology, and in many business applications, mining frequent patterns and finding associations among them requires handling large and distributed databases. As the FP-tree is considered the best compact data structure for holding data patterns in memory, there have been efforts to make it parallel and distributed in order to handle large databases. However, it incurs l...

A Review on Virtual Database System Using Map-Reduce Technology (International Journal of Pure and Applied Research in Engineering and Technology)

Abstract: Data integration in cloud and grid computing plays a very important role in many applications and in research. Many algorithms and systems have been designed and developed to address these issues. Virtual database systems are one effective solution for data integration. The existing solutions for designing virtual database systems are not very effective. Map Reduce is a computing mo...

Distributed Databases

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users [1]. The term "distributed database system" (DDBS) is typically used to refer ...

Publication date: 2010